- This is about the general method for making these types of inferences.
- I don’t actually know the details of Facebook’s particular implementation,
- or how BuzzFeed grades its quizzes.
New York Times
https://theoutline.com/post/969/did-trump-win-psychometrics-data-cambridge-analytica
“So what does all this have to do with elections? In 2014, a young assistant professor named Aleksandr Kogan requested access to Kosinski’s database on behalf of an “election management agency” based in London called Strategic Communications Laboratories. Kosinski turned Kogan down, but Kogan went ahead and registered a company under SCL’s umbrella called Cambridge Analytica — an homage, he said, to the university’s work in the field. Cambridge Analytica, under CEO Alexander Nix, went on to work for the pro-Brexit campaign, Senator Ted Cruz’s presidential nomination bid, and then on Donald Trump’s presidential campaign.”
“On the day of the third presidential debate between Trump and Clinton, Trump’s team tested 175,000 different ad variations for his arguments”
You take a quiz, and get a 78%. What does this mean?
According to Stevens (1946), measurement is “the assignment of numerals to objects or events according to rules”
According to Wright (1997), measures should be:
unidimensional
Messy notation:
What should a curve relating the abilities and probabilities look like? (Use ability as the independent variable)
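As a quick numerical sketch (my own illustration, not part of the slides): whatever curve we pick should be monotone increasing in ability and bounded between 0 and 1, since it outputs a probability. The logistic function, which the derivation arrives at below, has both properties:

```python
import math

def logistic(x):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Probability of a correct response at increasing abilities
# (item difficulty fixed at 0; both choices are illustrative).
abilities = [-4, -2, 0, 2, 4]
probs = [logistic(theta) for theta in abilities]

# The curve is monotone increasing and stays strictly inside (0, 1).
assert all(a < b for a, b in zip(probs, probs[1:]))
assert all(0 < p < 1 for p in probs)
print([round(p, 3) for p in probs])  # [0.018, 0.119, 0.5, 0.881, 0.982]
```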
\(\theta_A\) - \(\theta_B\) should be meaningful, and should not depend on the items used (specific objectivity)
Now, maybe \(\ln(D_{Ai})-\ln(D_{Bi}) = \theta_A - \theta_B\), or equivalently, \(\ln \left( \frac{P_{Ai}}{1-P_{Ai}} \right)-\ln \left( \frac{P_{Bi}}{1-P_{Bi}} \right) = \theta_A - \theta_B\)
By similar logic to the last slide, it is reasonable that \[ b_m-b_n = \ln \left( \frac{P_{Am}}{1-P_{Am}} \right) - \ln \left( \frac{P_{An}}{1-P_{An}} \right) \]
Now difficulty and ability are on the same scale
\(\theta_A - \theta_B = \ln \left( \frac{P_{Ai}}{1-P_{Ai}} \right)-\ln \left( \frac{P_{Bi}}{1-P_{Bi}} \right)\) means \(\theta_A = \ln \left( \frac{P_{Ai}}{1-P_{Ai}} \right) + C_1\)
and
\(b_m-b_n = \ln \left( \frac{P_{Am}}{1-P_{Am}} \right) - \ln \left( \frac{P_{An}}{1-P_{An}} \right)\) means \(b_m = \ln \left( \frac{P_{Am}}{1-P_{Am}} \right) + C_2\)
This means \(C_1 = b_m\) and \(C_2 = \theta_A\). Therefore, \(\theta_A - b_m = \ln \left( \frac{P_{Am}}{1-P_{Am}} \right)\)
Solve for \(P_{Am}\): \(P_{Am} = \dfrac{\exp(\theta_A-b_m)}{1+\exp(\theta_A-b_m)}\)
\[ P_{i,j} = \dfrac{e^{\theta_i-b_j}}{1+e^{\theta_i-b_j}} \]
This is the probability that person \(i\) will answer question \(j\) correctly.
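The model is easy to compute directly. Here is a sketch in Python (function and variable names are mine, not from the slides):

```python
import math

def rasch_prob(theta, b):
    """P(person with ability theta answers an item of difficulty b correctly)."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# When ability equals difficulty, the probability is exactly 1/2.
print(rasch_prob(1.0, 1.0))  # 0.5

# A one-logit ability advantage shifts the log-odds by the same amount
# on every item -- the "specific objectivity" property from earlier slides.
for b in (-2.0, 0.0, 3.0):
    diff = math.log(rasch_prob(1.0, b) / (1 - rasch_prob(1.0, b))) \
         - math.log(rasch_prob(0.0, b) / (1 - rasch_prob(0.0, b)))
    print(round(diff, 6))  # 1.0 each time, regardless of b
```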
##                             Dffclt Dscrmn   P(x=1|z=0)
## few_moments_alone     -2.393858933      1 9.163578e-01
## cautious              -1.472960963      1 8.135070e-01
## drained_after_out     -1.131583530      1 7.561310e-01
## enjoy_large_gather    -1.131124476      1 7.560464e-01
## work_alone            -0.542596614      1 6.324162e-01
## quiet_child           -0.006237960      1 5.015595e-01
## large crowd           -0.006014405      1 5.015036e-01
## risk_careful_research  0.259495753      1 4.354877e-01
## space_myself           0.531349428      1 3.702022e-01
## every_moment_alone    25.566068525      1 7.884924e-12
##                             Dffclt    Dscrmn   P(x=1|z=0)
## few_moments_alone     -2.908212570 0.7938469 9.095930e-01
## cautious              -1.780270664 0.7938469 8.042800e-01
## enjoy_large_gather    -1.364893865 0.7938469 7.471589e-01
## drained_after_out     -1.364760881 0.7938469 7.471390e-01
## work_alone            -0.650823089 0.7938469 6.263650e-01
## large crowd           -0.003267129 0.7938469 5.006484e-01
## quiet_child           -0.002924083 0.7938469 5.005803e-01
## risk_careful_research  0.317258943 0.7938469 4.373670e-01
## space_myself           0.645493587 0.7938469 3.746257e-01
## every_moment_alone    32.205290267 0.7938469 7.884924e-12
##                              Dffclt        Dscrmn   P(x=1|z=0)
## drained_after_out     -4.979436e+00  1.942927e-01 7.246145e-01
## few_moments_alone     -2.487551e+00  9.835886e-01 9.203217e-01
## work_alone            -4.576220e-01  1.347980e+00 6.495053e-01
## quiet_child            5.307843e-03  1.933437e+00 4.974344e-01
## large crowd            1.068790e-02  2.401956e+00 4.935824e-01
## space_myself           3.577605e-01  3.371001e+01 5.785727e-06
## risk_careful_research  4.701684e-01  4.833434e-01 4.434300e-01
## enjoy_large_gather     7.909974e-01 -1.910641e+00 8.192555e-01
## cautious               7.296065e+00 -1.723285e-01 7.785644e-01
## every_moment_alone     1.666318e+16  3.934787e-15 3.349795e-29
##                             Gussng        Dffclt        Dscrmn   P(x=1|z=0)
## drained_after_out     1.803486e-04 -4.506472e+00  2.150155e-01 7.249620e-01
## few_moments_alone     1.997998e-01 -2.151861e+00  1.009364e+00 9.181457e-01
## work_alone            7.858138e-16 -4.654201e-01  1.306085e+00 6.474567e-01
## quiet_child           3.392461e-94  6.801553e-03  2.009522e+00 4.965831e-01
## large crowd           0.000000e+00  1.222585e-02  2.503847e+00 4.923477e-01
## risk_careful_research 3.273587e-59  4.073067e-01  5.699412e-01 4.422240e-01
## enjoy_large_gather    2.000000e-01  4.084517e-01 -2.892699e+00 8.121779e-01
## space_myself          0.000000e+00  4.376223e-01  9.762449e+03 0.000000e+00
## cautious              1.067525e-01  1.235845e+01 -8.942770e-02 7.777877e-01
## every_moment_alone    0.000000e+00  1.077452e+08  2.372830e-07 7.884739e-12
We could either compare the models statistically, or try to make sense of which model might be reasonable.
What model might be good, and why?
##                             Dffclt Dscrmn
## few_moments_alone     -2.393858933      1
## cautious              -1.472960963      1
## drained_after_out     -1.131583530      1
## enjoy_large_gather    -1.131124476      1
## work_alone            -0.542596614      1
## quiet_child           -0.006237960      1
## large crowd           -0.006014405      1
## risk_careful_research  0.259495753      1
## space_myself           0.531349428      1
## every_moment_alone    25.566068525      1
This means the curve for “large_crowd” is:
\[\frac{\exp(\theta+0.006014405)}{1+\exp(\theta+0.006014405)}\] etc.
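As a check (a sketch in Python; the difficulty value is taken from the Rasch table above, the function name is mine):

```python
import math

b_large_crowd = -0.006014405  # Dffclt estimate for "large crowd" from the fit above

def icc(theta, b):
    """Item characteristic curve: P(agree | ability theta, difficulty b)."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# At theta = 0 this reproduces the P(x=1|z=0) column: about 0.5015.
print(round(icc(0.0, b_large_crowd), 7))
```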
Suppose I said “agree” to everything up to and including “enjoy_large_gather” and “disagree” to everything after.
I’m indicating that I follow the curve for the early ones, but not the later ones.
Suppose we have functions \(p_n(\theta) = \frac{\exp(\theta - \beta_n)}{1+\exp(\theta-\beta_n)}\)
The “likelihood function” for my introversion is \(p_1(\theta) \times p_2(\theta) \times p_3(\theta) \times p_4(\theta) \times (1-p_5(\theta)) \times (1-p_6(\theta)) \times (1-p_7(\theta)) \times (1-p_8(\theta)) \times (1-p_9(\theta)) \times (1-p_{10}(\theta))\)
What does this actually look like???
##  [1] -1.2888267 -0.5864932 -0.9332466 -0.5864932  0.4855150 -0.9332466
##  [7] -0.2404238 -0.2404238  0.1135676 -0.2404238 -0.2404238  0.8888117
## [13]  0.4855150  0.1135676  0.4855150  0.1135676  0.8888117  1.3411824